Chapter 2.3: List Searching
Linear Search: The Foundation
Before we can search complex structures, we need to master searching simple lists. Linear search, which checks each element one by one, is the most fundamental search algorithm. While it might seem trivial, implementing it recursively reveals core patterns we'll use throughout this chapter.
Let's start with a concrete problem: finding a specific file by name in a list of filenames.
The Reference Implementation: Finding a Single Match
Here's our anchor example: a function that searches for a filename in a list:
def find_file(filenames, target):
    """
    Find the index of target filename in the list.
    Returns -1 if not found.
    Examples:
    >>> files = ["config.json", "data.csv", "README.md", "app.py"]
    >>> find_file(files, "data.csv")
    1
    >>> find_file(files, "missing.txt")
    -1
    """
    # Base case: empty list means not found
    if len(filenames) == 0:
        return -1
    # Check first element
    if filenames[0] == target:
        return 0  # Found at index 0
    # Recursive case: search the rest
    result = find_file(filenames[1:], target)
    # If found in rest, adjust index
    if result == -1:
        return -1
    else:
        return result + 1  # Add 1 because we skipped the first element

# Test the function
files = ["config.json", "data.csv", "README.md", "app.py"]
print("Searching for 'data.csv':")
print(f"Index: {find_file(files, 'data.csv')}")
print()
print("Searching for 'README.md':")
print(f"Index: {find_file(files, 'README.md')}")
print()
print("Searching for 'missing.txt':")
print(f"Index: {find_file(files, 'missing.txt')}")
Python Output:
Searching for 'data.csv':
Index: 1
Searching for 'README.md':
Index: 2
Searching for 'missing.txt':
Index: -1
How It Works:
The function follows the "first + rest" pattern:
1. Base case: Empty list → return -1 (not found)
2. Check first: If filenames[0] matches target → return 0
3. Recurse on rest: Search filenames[1:]
4. Adjust index: If found in rest, add 1 to account for the skipped element
Execution Trace for find_file(["config.json", "data.csv", "README.md"], "data.csv"):
find_file(["config.json", "data.csv", "README.md"], "data.csv")
→ first="config.json" ≠ "data.csv"
→ find_file(["data.csv", "README.md"], "data.csv")
  → first="data.csv" == "data.csv" ✓
  → return 0
→ return 0 + 1 = 1
Result: 1
The trace shows why the answer is 1: the match sits at index 0 of the rest, and the unwinding adds 1 for the skipped first element. To watch every call as it happens, we can instrument the function with debug output:
# Let's add debug output to see what's happening
def find_file_debug(filenames, target, depth=0):
    """Version with debug output to trace execution."""
    indent = "  " * depth
    print(f"{indent}find_file({filenames}, '{target}')")
    if len(filenames) == 0:
        print(f"{indent}→ Base case: empty list, return -1")
        return -1
    if filenames[0] == target:
        print(f"{indent}→ Found '{target}' at index 0")
        return 0
    print(f"{indent}→ '{filenames[0]}' ≠ '{target}', searching rest...")
    result = find_file_debug(filenames[1:], target, depth + 1)
    if result == -1:
        print(f"{indent}→ Not found in rest, return -1")
        return -1
    else:
        adjusted = result + 1
        print(f"{indent}→ Found at index {result} in rest, return {adjusted}")
        return adjusted

# Test with debug output
files = ["config.json", "data.csv", "README.md", "app.py"]
print("=== Searching for 'data.csv' ===")
index = find_file_debug(files, "data.csv")
print(f"\nFinal result: {index}\n")
Python Output:
=== Searching for 'data.csv' ===
find_file(['config.json', 'data.csv', 'README.md', 'app.py'], 'data.csv')
→ 'config.json' ≠ 'data.csv', searching rest...
  find_file(['data.csv', 'README.md', 'app.py'], 'data.csv')
  → Found 'data.csv' at index 0
→ Found at index 0 in rest, return 1
Final result: 1
The debug trace confirms the index is 1. Let's verify against the original function:
files = ["config.json", "data.csv", "README.md", "app.py"]
print(f"Index of 'data.csv': {find_file(files, 'data.csv')}")
print(f"Verification: files[1] = '{files[1]}'")
Python Output:
Index of 'data.csv': 1
Verification: files[1] = 'data.csv'
Perfect! The function works correctly. Now let's understand its limitations.
Current Limitation: Only Finds First Occurrence
Our function stops at the first match. But what if we have duplicate filenames and need to find all of them?
# Scenario: duplicate filenames in different directories
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
print("Files:", files)
print(f"First 'config.json' at index: {find_file(files, 'config.json')}")
print("\nBut there are actually 3 occurrences at indices: 0, 2, 4")
Python Output:
Files: ['config.json', 'data.csv', 'config.json', 'README.md', 'config.json']
First 'config.json' at index: 0
But there are actually 3 occurrences at indices: 0, 2, 4
The Problem: We need a way to find ALL occurrences, not just the first one.
What we need: A function that returns a list of all matching indices.
Finding All Occurrences
Iteration 1: Attempting to Collect Indices
Let's try to modify our function to collect all matching indices:
def find_all_files_v1(filenames, target):
    """
    Attempt to find all indices where target appears.
    WARNING: This version has a bug!
    """
    # Base case: empty list
    if len(filenames) == 0:
        return []
    # Check first element
    if filenames[0] == target:
        # Found a match! Now search rest and add this index
        rest_results = find_all_files_v1(filenames[1:], target)
        return [0] + rest_results  # Add current index to results
    else:
        # Not a match, just search rest
        return find_all_files_v1(filenames[1:], target)

# Test it
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
result = find_all_files_v1(files, "config.json")
print(f"Found 'config.json' at indices: {result}")
print(f"Expected: [0, 2, 4]")
Python Output:
Found 'config.json' at indices: [0, 0, 0]
Expected: [0, 2, 4]
Diagnostic Analysis: Understanding the Failure
The incorrect output:
Found 'config.json' at indices: [0, 0, 0]
Expected: [0, 2, 4]
Let's parse this section by section:
- The incorrect output: [0, 0, 0] instead of [0, 2, 4]
  - What this tells us: We're finding the right NUMBER of matches (3), but all indices are 0
- The pattern in the wrong answer:
  - All indices are 0
  - This suggests we're not adjusting indices as we recurse deeper
  - Each recursive call is returning 0 for "found at current position"
- Tracing the logic manually:
Let's trace find_all_files_v1(["config.json", "data.csv", "config.json"], "config.json"):
Call 1: find_all_files_v1(["config.json", "data.csv", "config.json"], "config.json")
        → filenames[0] = "config.json" == target ✓
        → Recurse on ["data.csv", "config.json"]
Call 2: find_all_files_v1(["data.csv", "config.json"], "config.json")
        → filenames[0] = "data.csv" ≠ target
        → Recurse on ["config.json"]
Call 3: find_all_files_v1(["config.json"], "config.json")
        → filenames[0] = "config.json" == target ✓
        → Recurse on []
Call 4: find_all_files_v1([], "config.json")
        → Base case: return []
Call 3 returns: [0] + [] = [0]
Call 2 returns: [0] (no adjustment!)
Call 1 returns: [0] + [0] = [0, 0]
Final: [0, 0]
- Root cause identified: When we find a match and recurse, we return [0] + rest_results. But rest_results contains indices relative to the REST of the list, not the original list. We need to adjust those indices.
Why the current approach can't solve this: We're treating all matches as if they're at index 0 of their respective sublists, but we never account for how many elements we've already skipped.
What we need: A way to track our current position and adjust indices accordingly.
Iteration 2: Adding Position Tracking
Before (Iteration 1):
def find_all_files_v1(filenames, target):
    if len(filenames) == 0:
        return []
    if filenames[0] == target:
        rest_results = find_all_files_v1(filenames[1:], target)
        return [0] + rest_results  # ✗ Always returns 0 for current match
    else:
        return find_all_files_v1(filenames[1:], target)
Problem: Indices aren't adjusted for position in original list.
After (Iteration 2):
def find_all_files_v2(filenames, target, current_index=0):
    """
    Find all indices where target appears.
    Uses current_index parameter to track position.
    """
    # Base case: empty list
    if len(filenames) == 0:
        return []
    # Check first element
    if filenames[0] == target:
        # Found a match at current_index
        rest_results = find_all_files_v2(filenames[1:], target, current_index + 1)
        return [current_index] + rest_results
    else:
        # Not a match, continue searching
        return find_all_files_v2(filenames[1:], target, current_index + 1)

# Test it
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
result = find_all_files_v2(files, "config.json")
print(f"Found 'config.json' at indices: {result}")
print(f"Expected: [0, 2, 4]")
print()
# Verify each index
for idx in result:
    print(f"files[{idx}] = '{files[idx]}'")
Python Output:
Found 'config.json' at indices: [0, 2, 4]
Expected: [0, 2, 4]
files[0] = 'config.json'
files[2] = 'config.json'
files[4] = 'config.json'
Improvement: Now correctly tracks position using an accumulator parameter current_index.
Execution Trace:
find_all_files_v2(["config.json", "data.csv", "config.json", "README.md", "config.json"],
                  "config.json", current_index=0)
→ filenames[0]="config.json" == target, current_index=0
→ find_all_files_v2(["data.csv", "config.json", "README.md", "config.json"],
                    "config.json", current_index=1)
  → filenames[0]="data.csv" ≠ target, current_index=1
  → find_all_files_v2(["config.json", "README.md", "config.json"],
                      "config.json", current_index=2)
    → filenames[0]="config.json" == target, current_index=2
    → find_all_files_v2(["README.md", "config.json"],
                        "config.json", current_index=3)
      → filenames[0]="README.md" ≠ target, current_index=3
      → find_all_files_v2(["config.json"], "config.json", current_index=4)
        → filenames[0]="config.json" == target, current_index=4
        → find_all_files_v2([], "config.json", current_index=5)
          → Base case: return []
        → return [4] + [] = [4]
      → return [4]
    → return [2] + [4] = [2, 4]
  → return [2, 4]
→ return [0] + [2, 4] = [0, 2, 4]
Result: [0, 2, 4]
The Accumulator Pattern
This solution introduces the accumulator pattern: using an additional parameter to track state across recursive calls. Here, current_index accumulates our position as we traverse the list.
Key insight: When you need to track "where you are" in a recursive traversal, add a parameter that increments with each recursive call.
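The pattern isn't specific to searching. As a second illustration (our own sketch, not part of the chapter's running example), here is a recursive sum that carries its running total in an accumulator parameter:

```python
def sum_list(nums, total=0):
    """Recursively sum a list, carrying the running total in an accumulator."""
    # Base case: nothing left to add - the accumulator already holds the answer
    if len(nums) == 0:
        return total
    # Recursive case: fold the first element into the accumulator
    return sum_list(nums[1:], total + nums[0])

print(sum_list([3, 1, 4, 1, 5]))  # 14
```

The accumulator moves the "work so far" down into the recursive call, so the base case can return the finished answer directly instead of combining results on the way back up.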
Alternative Approach: Adjusting Indices Without Extra Parameter
We can also solve this by adjusting indices from the recursive results:
def find_all_files_v3(filenames, target):
    """
    Find all indices without extra parameter.
    Adjusts indices from recursive results.
    """
    # Base case: empty list
    if len(filenames) == 0:
        return []
    # Recursive case: search the rest first
    rest_results = find_all_files_v3(filenames[1:], target)
    # Adjust all indices from rest (they're off by 1)
    adjusted_results = [idx + 1 for idx in rest_results]
    # Check first element
    if filenames[0] == target:
        return [0] + adjusted_results
    else:
        return adjusted_results

# Test it
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
result = find_all_files_v3(files, "config.json")
print(f"Found 'config.json' at indices: {result}")
Python Output:
Found 'config.json' at indices: [0, 2, 4]
How this works:
1. Recurse on rest FIRST (before checking the current element)
2. Get indices from rest (these are relative to the rest)
3. Adjust ALL indices by adding 1 (to account for the skipped first element)
4. If the first element matches, prepend 0 to the adjusted results
Execution Trace:
find_all_files_v3(["config.json", "data.csv", "config.json"], "config.json")
→ Recurse on ["data.csv", "config.json"]
  find_all_files_v3(["data.csv", "config.json"], "config.json")
  → Recurse on ["config.json"]
    find_all_files_v3(["config.json"], "config.json")
    → Recurse on []
      find_all_files_v3([], "config.json")
      → Base case: return []
    → rest_results = []
    → adjusted = []
    → "config.json" == target, return [0] + [] = [0]
  → rest_results = [0]
  → adjusted = [1] (added 1 to each)
  → "data.csv" ≠ target, return [1]
→ rest_results = [1]
→ adjusted = [2] (added 1 to each)
→ "config.json" == target, return [0] + [2] = [0, 2]
Result: [0, 2]
Comparing the Two Approaches
| Aspect | v2 (Accumulator) | v3 (Adjust Results) |
|---|---|---|
| Extra parameter | Yes (current_index) | No |
| When indices computed | During recursion down | During recursion up |
| List comprehension | No | Yes (to adjust indices) |
| Conceptual model | Track position as we go | Fix indices on return |
| Efficiency | O(n) time, O(n) space | O(n) time, O(n) space |
Both are valid! The accumulator version (v2) is often clearer because it makes the position tracking explicit.
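Since the two versions should agree everywhere, we can cross-check them against a simple comprehension on random inputs. This is a quick sketch with condensed redefinitions of v2 and v3 (the helper names are ours):

```python
import random

def find_all_v2(items, target, current_index=0):
    """Accumulator version (v2), condensed."""
    if not items:
        return []
    rest = find_all_v2(items[1:], target, current_index + 1)
    return ([current_index] + rest) if items[0] == target else rest

def find_all_v3(items, target):
    """Index-adjusting version (v3), condensed."""
    if not items:
        return []
    adjusted = [i + 1 for i in find_all_v3(items[1:], target)]
    return ([0] + adjusted) if items[0] == target else adjusted

# Compare both against an enumerate-based reference on 100 random lists
random.seed(0)
for _ in range(100):
    xs = [random.choice("abc") for _ in range(random.randrange(10))]
    expected = [i for i, x in enumerate(xs) if x == "a"]
    assert find_all_v2(xs, "a") == expected
    assert find_all_v3(xs, "a") == expected
print("v2 and v3 agree on 100 random lists")
```

Randomized cross-checking like this is a cheap way to gain confidence that two implementations really are interchangeable.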
Complexity Analysis
Time Complexity: O(n)
- Visit each element exactly once
- Single recursive call per element
- Depth of recursion: n
Space Complexity: O(n)
- Call stack depth: n
- Result list: O(k), where k = number of matches
- Total: O(n) for the call stack
Recurrence Relation: T(n) = T(n-1) + O(1)
- Solves to O(n) by telescoping
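One practical caveat worth adding here: CPython caps recursion depth (1000 frames by default), so this O(n) recursion depth means the functions fail outright on long lists. A sketch demonstrating the limit with a condensed copy of find_file (the exact limit depends on your interpreter settings):

```python
import sys

def find_file(filenames, target):
    """Condensed copy of the chapter's recursive linear search."""
    if not filenames:
        return -1
    if filenames[0] == target:
        return 0
    rest = find_file(filenames[1:], target)
    return -1 if rest == -1 else rest + 1

# One recursive call per element: a 10,000-element list needs ~10,000 frames,
# which blows past CPython's default limit (sys.getrecursionlimit(), usually 1000)
big = [f"file{i}.txt" for i in range(10_000)]
try:
    find_file(big, "missing.txt")
except RecursionError:
    print("RecursionError: list is longer than the recursion limit allows")
# sys.setrecursionlimit() can raise the cap, but for large inputs an
# iterative loop is the robust fix
```

This is why recursive linear search is a teaching device in Python rather than a production technique; the iterative equivalent has no depth limit.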
Current Limitation: Only Works on Flat Lists
Both versions work great for simple lists. But what if our filenames are organized in nested structures?
# Scenario: nested directory structure
file_structure = [
    "config.json",
    ["subdir1", ["data.csv", "config.json"]],
    "README.md",
    ["subdir2", ["config.json", "app.py"]]
]
print("File structure:", file_structure)
print("\nTrying to find 'config.json' with our current function...")
result = find_all_files_v2(file_structure, "config.json")
print(f"Result: {result}")
Python Output:
File structure: ['config.json', ['subdir1', ['data.csv', 'config.json']], 'README.md', ['subdir2', ['config.json', 'app.py']]]
Trying to find 'config.json' with our current function...
Result: [0]
Diagnostic Analysis: Understanding the Failure
No exception is raised, but the result is silently wrong:
Result: [0]
Let's parse this section by section:
- The incorrect output: [0] instead of all three occurrences
  - What this tells us: Only the top-level match was found; the matches inside both subdirectories were missed entirely
- Where it fails:
  - At the line: if filenames[0] == target:
  - When filenames[0] is ['subdir1', ['data.csv', 'config.json']], comparing a list to a string simply evaluates to False
  - The function treats the whole subdirectory as "not a match" and skips past it without ever looking inside
- Root cause identified: Our function assumes all elements are strings, but nested structures contain both strings AND lists.
Why the current approach can't solve this: We need to handle two types of elements:
- Strings (filenames): compare directly
- Lists (subdirectories): recurse into them
What we need: A way to detect each element's type and handle it appropriately.
Searching Nested Lists
The Challenge of Arbitrary Depth
Nested lists introduce a new dimension of complexity: arbitrary depth. We don't know how many levels deep the nesting goes, and we need to search ALL levels.
This is our first taste of structural recursion: recursion that follows the structure of the data itself.
Iteration 3: Handling Nested Structures
Let's modify our function to handle both strings and nested lists:
def find_in_nested_v1(structure, target):
    """
    Search for target in a nested list structure.
    Returns list of "paths" to each occurrence.
    A path is a list of indices showing how to reach the item.
    Example: [1, 2] means structure[1][2]
    """
    results = []
    # Iterate through each element with its index
    for i, element in enumerate(structure):
        if isinstance(element, list):
            # Element is a list - recurse into it
            nested_results = find_in_nested_v1(element, target)
            # Prepend current index to each path found
            for path in nested_results:
                results.append([i] + path)
        else:
            # Element is not a list - check if it matches
            if element == target:
                results.append([i])
    return results

# Test it
file_structure = [
    "config.json",                             # index 0
    ["subdir1", ["data.csv", "config.json"]],  # index 1
    "README.md",                               # index 2
    ["subdir2", ["config.json", "app.py"]]     # index 3
]
print("File structure:")
print(file_structure)
print()
paths = find_in_nested_v1(file_structure, "config.json")
print(f"Found 'config.json' at paths: {paths}")
print()
# Verify each path
for path in paths:
    # Navigate to the item using the path
    item = file_structure
    for idx in path:
        item = item[idx]
    print(f"Path {path} → '{item}'")
Python Output:
File structure:
['config.json', ['subdir1', ['data.csv', 'config.json']], 'README.md', ['subdir2', ['config.json', 'app.py']]]
Found 'config.json' at paths: [[0], [1, 1, 1], [3, 1, 0]]
Path [0] → 'config.json'
Path [1, 1, 1] → 'config.json'
Path [3, 1, 0] → 'config.json'
How It Works:
This function uses a different recursive pattern:
- Iterate through elements (not first + rest)
- Type check each element:
  - If it's a list → recurse into it
  - If it's not a list → check if it matches target
- Build paths by prepending the current index to nested results
Execution Trace for Simplified Structure:
Let's trace find_in_nested_v1([["a", "b"], "b", ["b"]], "b"):
find_in_nested_v1([["a", "b"], "b", ["b"]], "b")
→ i=0, element=["a", "b"] (is list)
  → find_in_nested_v1(["a", "b"], "b")
    → i=0, element="a" (not list, not match)
    → i=1, element="b" (not list, MATCH!)
    → return [[1]]
  → nested_results = [[1]]
  → prepend 0: [[0, 1]]
→ i=1, element="b" (not list, MATCH!)
  → append [1]
→ i=2, element=["b"] (is list)
  → find_in_nested_v1(["b"], "b")
    → i=0, element="b" (not list, MATCH!)
    → return [[0]]
  → nested_results = [[0]]
  → prepend 2: [[2, 0]]
→ return [[0, 1], [1], [2, 0]]
Result: [[0, 1], [1], [2, 0]]
Understanding Path Representation
A path is a sequence of indices that tells you how to navigate to an item:
- [0] means "item at index 0 of the top-level list"
- [1, 1, 1] means "go to index 1, then index 1 of that sublist, then index 1 of that sublist"
- [3, 1, 0] means "go to index 3, then index 1 of that sublist, then index 0 of that sublist"
This is more useful than flat indices because it preserves the structure information.
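Following a path is just repeated indexing, which can be written as a fold over the path. A small sketch (get_by_path is our own helper name, not from the chapter):

```python
from functools import reduce
from operator import getitem

def get_by_path(structure, path):
    """Follow a list of indices down into a nested list, one index at a time."""
    return reduce(getitem, path, structure)

file_structure = [
    "config.json",
    ["subdir1", ["data.csv", "config.json"]],
    "README.md",
    ["subdir2", ["config.json", "app.py"]],
]
print(get_by_path(file_structure, [1, 1, 1]))  # config.json
print(get_by_path(file_structure, [3, 1, 1]))  # app.py
```

reduce applies getitem once per path element, so `get_by_path(s, [1, 1, 1])` is exactly `s[1][1][1]`.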
Visualizing the Recursion Tree
For file_structure searching for "config.json":
find_in_nested_v1([...], "config.json")
├─ i=0: "config.json" → MATCH! → [0]
├─ i=1: ["subdir1", [...]]
│  ├─ i=0: "subdir1" → no match
│  └─ i=1: ["data.csv", "config.json"]
│     ├─ i=0: "data.csv" → no match
│     └─ i=1: "config.json" → MATCH! → [1] → prepend 1 → [1,1] → prepend 1 → [1,1,1]
├─ i=2: "README.md" → no match
└─ i=3: ["subdir2", [...]]
   ├─ i=0: "subdir2" → no match
   └─ i=1: ["config.json", "app.py"]
      ├─ i=0: "config.json" → MATCH! → [0] → prepend 1 → [1,0] → prepend 3 → [3,1,0]
      └─ i=1: "app.py" → no match
Results: [[0], [1,1,1], [3,1,0]]
Current Limitation: Not Purely Recursive
Notice that this implementation uses a for loop. While it works, it's not following the pure "first + rest" recursive pattern we've been learning. Can we make it more recursive?
Iteration 4: Pure Recursive Approach
Let's rewrite using the first + rest pattern:
def find_in_nested_v2(structure, target):
    """
    Pure recursive version using first + rest pattern.
    Returns list of paths to each occurrence.
    """
    # Base case: empty structure
    if len(structure) == 0:
        return []
    # Get first element and rest
    first = structure[0]
    rest = structure[1:]
    # Results from searching the rest
    rest_results = find_in_nested_v2(rest, target)
    # Adjust rest results (the first index of each path is off by 1)
    adjusted_rest = [[idx + 1] + path for idx, *path in rest_results]
    # Handle first element
    if isinstance(first, list):
        # First is a list - recurse into it
        nested_results = find_in_nested_v2(first, target)
        # Prepend 0 to each nested path
        first_results = [[0] + path for path in nested_results]
        return first_results + adjusted_rest
    else:
        # First is not a list - check if it matches
        if first == target:
            return [[0]] + adjusted_rest
        else:
            return adjusted_rest

# Test it
file_structure = [
    "config.json",
    ["subdir1", ["data.csv", "config.json"]],
    "README.md",
    ["subdir2", ["config.json", "app.py"]]
]
paths = find_in_nested_v2(file_structure, "config.json")
print(f"Found 'config.json' at paths: {paths}")
print()
# Verify
for path in paths:
    item = file_structure
    for idx in path:
        item = item[idx]
    print(f"Path {path} → '{item}'")
Python Output:
Found 'config.json' at paths: [[0], [1, 1, 1], [3, 1, 0]]
Path [0] → 'config.json'
Path [1, 1, 1] → 'config.json'
Path [3, 1, 0] → 'config.json'
How This Version Works:
1. Base case: Empty structure → return []
2. Split: Get first and rest
3. Recurse on rest: Get all matches from rest
4. Adjust rest indices: Add 1 to the first index of each path (because rest starts at index 1 of the original)
5. Handle first:
   - If it's a list → recurse into it, prepend 0 to the results
   - If it matches → return [[0]] plus the adjusted rest
   - If it doesn't match → return the adjusted rest
Comparing the Two Nested Approaches
| Aspect | v1 (Loop-based) | v2 (Pure Recursive) |
|---|---|---|
| Uses loops | Yes (for loop) | No |
| Recursive pattern | Structural recursion | First + rest |
| Readability | More intuitive | More "functional" |
| Efficiency | Same O(n) | Same O(n) |
| Pythonic | More idiomatic | More academic |
Honest assessment: The loop-based version (v1) is clearer and more Pythonic. The pure recursive version (v2) is interesting academically but doesn't provide practical benefits in Python.
When to use each:
- v1 (loop-based): Production code, when clarity matters
- v2 (pure recursive): Learning recursion patterns, functional programming contexts
Complexity Analysis for Nested Search
Time Complexity: O(n)
- Where n = total number of elements at all nesting levels
- Each element is visited exactly once
- Type checking is O(1)
Space Complexity: O(d + k)
- d = maximum depth of nesting (call stack)
- k = number of matches (result list)
- Worst case: O(n) if deeply nested or many matches
Key Insight: Nested recursion has TWO dimensions:
1. Horizontal: Processing elements at the same level (first + rest)
2. Vertical: Descending into nested structures (recursive call on sublists)
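The vertical dimension (the depth d) can itself be computed recursively. A small sketch (max_depth is our own helper, not from the chapter):

```python
def max_depth(structure):
    """Return nesting depth: 1 for a flat list, plus 1 per level of sublists."""
    # Vertical recursion: descend into each sublist and keep the deepest result
    depths = [max_depth(el) for el in structure if isinstance(el, list)]
    return 1 + (max(depths) if depths else 0)

nested = ["a", ["b", ["c", "d"]], "e"]
print(max_depth(nested))  # 3
```

Note the shape: the horizontal dimension is the comprehension over elements, and the vertical dimension is the recursive call on each sublist, exactly mirroring the two dimensions above.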
Real-World Application: File System Search
Let's apply this to a realistic scenario: searching a file system:
def find_files_by_extension(directory_structure, extension):
    """
    Find all files with given extension in nested directory structure.
    Structure format:
    - Strings represent files
    - Lists represent directories (first element is the directory name)
    Returns: List of paths to matching files
    """
    results = []
    for i, item in enumerate(directory_structure):
        if isinstance(item, list):
            # It's a directory - recurse into it
            nested_results = find_files_by_extension(item, extension)
            for path in nested_results:
                results.append([i] + path)
        elif isinstance(item, str):
            # It's a file - check extension
            if item.endswith(extension):
                results.append([i])
    return results
# Realistic file system structure
project = [
    "README.md",
    "setup.py",
    ["src", [
        "main.py",
        "utils.py",
        ["models", [
            "user.py",
            "product.py"
        ]]
    ]],
    ["tests", [
        "test_main.py",
        "test_utils.py",
        ["fixtures", [
            "data.json",
            "config.yaml"
        ]]
    ]],
    ["docs", [
        "guide.md",
        "api.md"
    ]]
]
def readable_path(structure, path):
    """Convert an index path into a human-readable 'dir/file' string."""
    parts = []
    temp = structure
    inside_directory = False
    for idx in path:
        node = temp[idx]
        if isinstance(node, list):
            if inside_directory:
                # The contents list inside a ["name", [contents]] wrapper -
                # it contributes nothing to the readable path
                inside_directory = False
            else:
                parts.append(node[0])  # directory name
                inside_directory = True
        else:
            parts.append(node)  # file name
        temp = node
    return "/".join(parts)

print("=== Finding all .py files ===")
py_files = find_files_by_extension(project, ".py")
print(f"Found {len(py_files)} Python files:")
for path in py_files:
    print(f"  {readable_path(project, path)}")

print("\n=== Finding all .md files ===")
md_files = find_files_by_extension(project, ".md")
print(f"Found {len(md_files)} Markdown files:")
for path in md_files:
    print(f"  {readable_path(project, path)}")
Python Output:
=== Finding all .py files ===
Found 7 Python files:
  setup.py
  src/main.py
  src/utils.py
  src/models/user.py
  src/models/product.py
  tests/test_main.py
  tests/test_utils.py
=== Finding all .md files ===
Found 3 Markdown files:
  README.md
  docs/guide.md
  docs/api.md
This demonstrates how nested list recursion maps directly to real-world hierarchical structures like file systems!
Binary Search on Sorted Lists
From Linear to Logarithmic
All our search functions so far have been linear search: checking each element one by one. This is O(n) time complexity.
But when the list is sorted, we can do much better using binary search, which runs in O(log n) time.
The key insight: In a sorted list, we can eliminate half the search space with each comparison.
The Binary Search Algorithm
Strategy:
1. Look at the middle element
2. If it matches → found!
3. If target is smaller → search the left half
4. If target is larger → search the right half
5. Repeat until found or the search space is empty
Iteration 1: Basic Binary Search
Let's implement binary search recursively:
def binary_search_v1(sorted_list, target):
    """
    Binary search for target in sorted list.
    Returns index if found, -1 if not found.
    WARNING: This version has a subtle bug!
    """
    # Base case: empty list
    if len(sorted_list) == 0:
        return -1
    # Find middle index
    mid = len(sorted_list) // 2
    mid_value = sorted_list[mid]
    # Check middle element
    if mid_value == target:
        return mid
    elif target < mid_value:
        # Search left half
        return binary_search_v1(sorted_list[:mid], target)
    else:
        # Search right half
        return binary_search_v1(sorted_list[mid+1:], target)

# Test it
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"Sorted list: {numbers}")
print()
print(f"Searching for 7: {binary_search_v1(numbers, 7)}")
print(f"Searching for 15: {binary_search_v1(numbers, 15)}")
print(f"Searching for 1: {binary_search_v1(numbers, 1)}")
print(f"Searching for 19: {binary_search_v1(numbers, 19)}")
print(f"Searching for 8 (not in list): {binary_search_v1(numbers, 8)}")
Python Output:
Sorted list: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Searching for 7: 0
Searching for 15: 1
Searching for 1: 0
Searching for 19: 0
Searching for 8 (not in list): -1
Diagnostic Analysis: Understanding the Failure
The output shows incorrect indices! Let's verify:
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"numbers[0] = {numbers[0]} (searched for 7) ✗ - 7 is actually at index 3!")
print(f"numbers[1] = {numbers[1]} (searched for 15) ✗ - 15 is actually at index 7!")
print(f"numbers[0] = {numbers[0]} (searched for 1) ✓")
print(f"numbers[0] = {numbers[0]} (searched for 19) ✗ - 19 is actually at index 9!")
Python Output:
numbers[0] = 1 (searched for 7) ✗ - 7 is actually at index 3!
numbers[1] = 3 (searched for 15) ✗ - 15 is actually at index 7!
numbers[0] = 1 (searched for 1) ✓
numbers[0] = 1 (searched for 19) ✗ - 19 is actually at index 9!
The complete execution trace:
Let's trace binary_search_v1([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15):
Call 1: binary_search_v1([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15)
        mid = 10 // 2 = 5
        mid_value = sorted_list[5] = 11
        15 > 11, search right half
Call 2: binary_search_v1([13, 15, 17, 19], 15)
        mid = 4 // 2 = 2
        mid_value = sorted_list[2] = 17
        15 < 17, search left half
Call 3: binary_search_v1([13, 15], 15)
        mid = 2 // 2 = 1
        mid_value = sorted_list[1] = 15
        15 == 15, FOUND!
        return 1
Return 1 (but 15 is actually at index 7 in the original list!)
Let's parse this section by section:
- The error pattern: Each search returns the index within the final sublist, not the original list
  - 7 (index 3) → returns 0, its index in the sublist [7]
  - 15 (index 7) → returns 1, its index in the sublist [13, 15]
  - 1 (index 0) → returns 0, accidentally correct, since 1 stays at the front of every left slice
  - 19 (index 9) → returns 0, its index in the sublist [19]
- Root cause identified: When we recurse on a sublist, we return the index within that sublist, not the original list.
- The problem:
  - We slice the list: sorted_list[mid+1:]
  - This creates a NEW list starting at index 0
  - When we find the target, we return its index in the NEW list
  - But we need its index in the ORIGINAL list
Why the current approach can't solve this: We're losing track of where we are in the original list.
What we need: Track the offset into the original list, similar to our current_index parameter in earlier examples.
Iteration 2: Tracking Original Indices
Before (Iteration 1):
def binary_search_v1(sorted_list, target):
    if len(sorted_list) == 0:
        return -1
    mid = len(sorted_list) // 2
    mid_value = sorted_list[mid]
    if mid_value == target:
        return mid  # ✗ Returns index in current sublist
    elif target < mid_value:
        return binary_search_v1(sorted_list[:mid], target)
    else:
        return binary_search_v1(sorted_list[mid+1:], target)  # ✗ Loses offset
Problem: Returns indices relative to sublists, not original list.
After (Iteration 2):
def binary_search_v2(sorted_list, target, offset=0):
    """
    Binary search with offset tracking.
    Returns index in original list.
    """
    # Base case: empty list
    if len(sorted_list) == 0:
        return -1
    # Find middle index
    mid = len(sorted_list) // 2
    mid_value = sorted_list[mid]
    # Check middle element
    if mid_value == target:
        return offset + mid  # ✓ Add offset to get original index
    elif target < mid_value:
        # Search left half (offset stays the same)
        return binary_search_v2(sorted_list[:mid], target, offset)
    else:
        # Search right half (offset increases)
        new_offset = offset + mid + 1
        return binary_search_v2(sorted_list[mid+1:], target, new_offset)

# Test it
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"Sorted list: {numbers}")
print()
test_values = [7, 15, 1, 19, 8]
for val in test_values:
    idx = binary_search_v2(numbers, val)
    if idx != -1:
        print(f"Searching for {val}: index {idx} → numbers[{idx}] = {numbers[idx]} ✓")
    else:
        print(f"Searching for {val}: not found ✗")
Python Output:
Sorted list: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Searching for 7: index 3 → numbers[3] = 7 ✓
Searching for 15: index 7 → numbers[7] = 15 ✓
Searching for 1: index 0 → numbers[0] = 1 ✓
Searching for 19: index 9 → numbers[9] = 19 ✓
Searching for 8: not found ✗
Improvement: Now correctly returns indices in the original list by tracking offset.
Execution Trace for binary_search_v2([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15, offset=0):
Call 1: binary_search_v2([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15, offset=0)
        mid = 5, mid_value = 11
        15 > 11, search right
        new_offset = 0 + 5 + 1 = 6
Call 2: binary_search_v2([13, 15, 17, 19], 15, offset=6)
        mid = 2, mid_value = 17
        15 < 17, search left
        offset stays 6
Call 3: binary_search_v2([13, 15], 15, offset=6)
        mid = 1, mid_value = 15
        15 == 15, FOUND!
        return 6 + 1 = 7 ✓
Return 7 (correct!)
Visualizing Binary Search
Let's visualize how binary search eliminates half the search space each time:
Searching for 15 in [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]:
Step 1: [1, 3, 5, 7, 9, |11|, 13, 15, 17, 19]
        15 > 11, search right →
Step 2: [13, 15, |17|, 19]
        15 < 17, search left ←
Step 3: [13, |15|]
        15 == 15, FOUND! ✓
Total comparisons: 3
Linear search would need: 8 comparisons
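We can verify those counts by instrumenting both algorithms to count comparisons. A sketch (the counting functions are our own, mirroring the slicing version of binary search above):

```python
def linear_comparisons(items, target):
    """Count equality checks a linear scan makes before finding target."""
    for count, value in enumerate(items, start=1):
        if value == target:
            return count
    return len(items)  # target absent: every element was checked

def binary_comparisons(items, target):
    """Count middle-element probes the slicing binary search makes."""
    if not items:
        return 0
    mid = len(items) // 2
    if items[mid] == target:
        return 1
    if target < items[mid]:
        return 1 + binary_comparisons(items[:mid], target)
    return 1 + binary_comparisons(items[mid + 1:], target)

numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(linear_comparisons(numbers, 15))  # 8
print(binary_comparisons(numbers, 15))  # 3
```

The counts match the visualization: linear search walks 8 elements to reach 15, while binary search probes 11, 17, then 15.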
Alternative: Index-Based Approach (No Slicing)
Slicing creates new lists, which costs memory. A more efficient approach uses indices:
def binary_search_v3(sorted_list, target, left=0, right=None):
    """
    Binary search using indices instead of slicing.
    More efficient - no list copying.
    """
    # Initialize right on first call
    if right is None:
        right = len(sorted_list) - 1
    # Base case: search space exhausted
    if left > right:
        return -1
    # Find middle index
    mid = (left + right) // 2
    mid_value = sorted_list[mid]
    # Check middle element
    if mid_value == target:
        return mid
    elif target < mid_value:
        # Search left half
        return binary_search_v3(sorted_list, target, left, mid - 1)
    else:
        # Search right half
        return binary_search_v3(sorted_list, target, mid + 1, right)

# Test it
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"Sorted list: {numbers}")
print()
test_values = [7, 15, 1, 19, 8]
for val in test_values:
    idx = binary_search_v3(numbers, val)
    if idx != -1:
        print(f"Searching for {val}: index {idx} → numbers[{idx}] = {numbers[idx]} ✓")
    else:
        print(f"Searching for {val}: not found ✗")
Python Output:
Sorted list: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Searching for 7: index 3 → numbers[3] = 7 ✓
Searching for 15: index 7 → numbers[7] = 15 ✓
Searching for 1: index 0 → numbers[0] = 1 ✓
Searching for 19: index 9 → numbers[9] = 19 ✓
Searching for 8: not found ✗
How This Version Works:
Instead of slicing the list, we maintain left and right pointers:
- left = start of current search range
- right = end of current search range
- mid = middle of current range
Execution Trace for binary_search_v3([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15):
Call 1: binary_search_v3([...], 15, left=0, right=9)
mid = (0 + 9) // 2 = 4
sorted_list[4] = 9
15 > 9, search right
Call 2: binary_search_v3([...], 15, left=5, right=9)
mid = (5 + 9) // 2 = 7
sorted_list[7] = 15
15 == 15, FOUND!
return 7
Return 7 ✓
Comparing Binary Search Approaches
| Aspect | v2 (Slicing + Offset) | v3 (Index-Based) |
|---|---|---|
| List copying | Yes (creates sublists) | No |
| Space complexity | O(n) total for sliced copies + O(log n) stack | O(log n) for stack only |
| Readability | More intuitive | Slightly more complex |
| Parameters | 2 (list, offset) | 3 (list, left, right) |
| Production use | Acceptable for small lists | Preferred for large lists |
Recommendation: Use v3 (index-based) in production code for better performance.
Complexity Analysis: Binary Search
Time Complexity: O(log n)
- Each step eliminates half the search space
- Maximum depth: log₂(n)
- Example: For n=1000, max steps = 10
Space Complexity:
- v2 (slicing): O(log n) for the call stack + O(n) total for the sliced copies (n/2 + n/4 + ... < n)
- v3 (indices): O(log n) for the call stack only
Recurrence Relation: T(n) = T(n/2) + O(1)
- Solves to O(log n) by the Master Theorem
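Unrolling the recurrence makes it concrete where the logarithm comes from:

```latex
\begin{aligned}
T(n) &= T(n/2) + c \\
     &= T(n/2^2) + 2c \\
     &\;\;\vdots \\
     &= T(n/2^k) + kc
\end{aligned}
```

The recursion bottoms out when n/2^k = 1, i.e. k = log₂(n), so T(n) = T(1) + c·log₂(n) = O(log n).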
Comparison with Linear Search:
| List Size | Linear Search | Binary Search |
|---|---|---|
| 10 | 10 | 4 |
| 100 | 100 | 7 |
| 1,000 | 1,000 | 10 |
| 1,000,000 | 1,000,000 | 20 |
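The comparison counts in the table can be checked with a small instrumented search. This sketch uses an iterative loop purely to make counting easy; the probe counts match the recursive version, and `count_probes` is our own helper name:

```python
import math

def count_probes(sorted_list, target):
    """Binary search that counts how many middle elements it inspects."""
    probes, left, right = 0, 0, len(sorted_list) - 1
    while left <= right:
        probes += 1
        mid = (left + right) // 2
        if sorted_list[mid] == target:
            return probes
        elif target < sorted_list[mid]:
            right = mid - 1
        else:
            left = mid + 1
    return probes  # target absent: the search ran to full depth

for n in [10, 100, 1000, 1000000]:
    data = list(range(n))
    worst = count_probes(data, n)  # n itself is absent, forcing the worst case
    print(f"n = {n:>7}: worst-case probes = {worst}, "
          f"ceil(log2(n+1)) = {math.ceil(math.log2(n + 1))}")
```

For n = 10, 100, 1000, and 1,000,000 this reports 4, 7, 10, and 20 probes, matching the table.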
Binary search is dramatically faster for large sorted lists!
When to Use Binary Search
Use binary search when:
- List is sorted (or can be sorted once)
- You'll perform many searches
- List is large (n > 100)
Use linear search when:
- List is unsorted and can't be sorted
- List is very small (n < 20)
- You only search once (sorting cost > search benefit)
Real-World Application: Finding Version Numbers
Let's apply binary search to a realistic scenario:
def find_version(versions, target_version):
    """
    Find a specific version in a sorted list of version numbers.
    Versions are strings like "1.2.3".
    """
    def version_to_tuple(v):
        """Convert version string to tuple for comparison."""
        return tuple(map(int, v.split('.')))

    # Convert the target once, rather than on every recursive call
    target_tuple = version_to_tuple(target_version)

    def binary_search_versions(left, right):
        if left > right:
            return -1
        mid = (left + right) // 2
        mid_version = version_to_tuple(versions[mid])
        if mid_version == target_tuple:
            return mid
        elif target_tuple < mid_version:
            return binary_search_versions(left, mid - 1)
        else:
            return binary_search_versions(mid + 1, right)

    return binary_search_versions(0, len(versions) - 1)
# Sorted list of software versions
versions = [
    "1.0.0", "1.0.1", "1.1.0", "1.2.0", "1.2.1",
    "2.0.0", "2.1.0", "2.1.1", "2.2.0", "3.0.0"
]
print("Available versions:", versions)
print()
test_versions = ["1.2.0", "2.1.1", "3.0.0", "1.5.0"]
for v in test_versions:
    idx = find_version(versions, v)
    if idx != -1:
        print(f"Version {v} found at index {idx}")
    else:
        print(f"Version {v} not found")
Python Output:
Available versions: ['1.0.0', '1.0.1', '1.1.0', '1.2.0', '1.2.1', '2.0.0', '2.1.0', '2.1.1', '2.2.0', '3.0.0']
Version 1.2.0 found at index 3
Version 2.1.1 found at index 7
Version 3.0.0 found at index 9
Version 1.5.0 not found
This demonstrates how binary search applies to real-world data like version numbers, dates, or any sortable data!
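For comparison, Python's standard library can produce the same result: precompute the version tuples once, then let `bisect.bisect_left` do the O(log n) search. This is a sketch; `find_version_bisect` and `version_key` are our own helper names:

```python
import bisect

versions = [
    "1.0.0", "1.0.1", "1.1.0", "1.2.0", "1.2.1",
    "2.0.0", "2.1.0", "2.1.1", "2.2.0", "3.0.0"
]

def version_key(v):
    """'1.2.3' -> (1, 2, 3), so versions compare numerically."""
    return tuple(map(int, v.split('.')))

# Convert every version once; bisect then searches the tuple list.
keys = [version_key(v) for v in versions]

def find_version_bisect(target):
    key = version_key(target)
    i = bisect.bisect_left(keys, key)  # leftmost insertion point
    return i if i < len(keys) and keys[i] == key else -1

print(find_version_bisect("1.2.0"))   # 3
print(find_version_bisect("1.5.0"))   # -1
```

The trade-off: a one-time O(n) key conversion buys a well-tested, non-recursive search for every lookup afterwards.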
Common Failure Modes and Debugging
Failure Mode Catalog
Let's document the common ways list searching functions fail and how to diagnose them.
Symptom: Wrong Indices Returned
Python output pattern:
Expected index: 5
Actual index: 2
Verification: list[2] ≠ target
Diagnostic clues:
- Returned index doesn't match target when verified
- Pattern: indices in right half are wrong, left half correct
- Or: all indices are off by a constant amount
Root cause: Not adjusting indices when recursing on sublists
Solution: Add offset tracking parameter or adjust indices from recursive results
Symptom: Infinite Recursion
Python output pattern:
RecursionError: maximum recursion depth exceeded
Diagnostic clues:
- Call stack shows same function repeated 1000+ times
- Parameters don't change or don't approach base case
- Base case never reached

Root cause:
- Missing base case
- Base case condition never becomes true
- Recursive call doesn't make progress toward base case

Solution:
- Add a proper base case (empty list check)
- Ensure each recursive call reduces the problem size
- Verify parameters approach the base case
Symptom: Missing Results
Python output pattern:
Found 2 occurrences
Expected 5 occurrences
Diagnostic clues:
- Some matches found, but not all
- Pattern: only finds matches at certain positions
- Nested matches might be missed

Root cause:
- Not recursing into nested structures
- Stopping after first match instead of continuing
- Not combining results from all recursive calls

Solution:
- Check element type before processing
- Accumulate ALL results, not just the first
- Recurse into nested structures
Symptom: Type Errors with Nested Lists
Python output pattern:
TypeError: 'in <string>' requires string as left operand, not list
Diagnostic clues:
- Error occurs when a list reaches code expecting a string
- Happens with nested list structures
- Error message mentions a type mismatch
Root cause: Treating a nested list (a subdirectory) as if it were a filename string, passing it to a string operation or comparison
Solution: Add type checking with isinstance() before comparison
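A minimal sketch of that fix (the `count_matches` helper and the directory-style data are illustrative, not part of the chapter's reference code):

```python
def count_matches(entries, target):
    """Count occurrences of target, descending into nested lists."""
    total = 0
    for entry in entries:
        if isinstance(entry, list):      # subdirectory: recurse, don't compare
            total += count_matches(entry, target)
        elif entry == target:            # plain filename: safe to compare
            total += 1
    return total

tree = ["app.py", ["utils", "app.py"], "README.md"]
print(count_matches(tree, "app.py"))  # 2
```

The `isinstance()` check routes lists to the recursive branch before any string operation can see them, which eliminates the TypeError at its source.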
Debugging Workflow for Recursive Search
Step 1: Verify the base case
Add print at base case:
def find_file_debug(filenames, target):
    if len(filenames) == 0:
        print("→ Base case: empty list")  # ← Add this
        return -1
    # ... rest of function
If you never see this message, your base case isn't being reached!
Step 2: Trace the recursive calls
Add prints showing parameters:
def find_file_debug(filenames, target, depth=0):
    indent = " " * depth
    print(f"{indent}find_file({filenames}, '{target}')")
    # ... rest of function
    # In recursive call:
    result = find_file_debug(filenames[1:], target, depth + 1)
This shows the call stack visually.
Step 3: Verify indices at each step
For index-returning functions:
def find_file_debug(filenames, target):
    # ... when match found:
    if filenames[0] == target:
        print(f"✓ Found '{target}' at index 0")
        print(f"  Verification: filenames[0] = '{filenames[0]}'")
        return 0
Step 4: Check index adjustments
When adjusting indices from recursive results:
result = find_file(filenames[1:], target)
print(f"→ Recursive result: {result}")
if result != -1:
    adjusted = result + 1
    print(f"→ Adjusted index: {adjusted}")
    return adjusted
Step 5: Manually trace with small input
Pick a tiny example and trace by hand:
# Use a 3-element list for manual tracing
test_list = ["a", "b", "c"]
target = "b"
# Expected:
# Call 1: ["a", "b", "c"], "b"
#   "a" ≠ "b", recurse on ["b", "c"]
# Call 2: ["b", "c"], "b"
#   "b" == "b", return 0
# Adjust: 0 + 1 = 1
# Return: 1 ✓
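Putting the five steps together, one possible fully instrumented version looks like this (the print format and `find_file_debug` name are a suggestion, not the chapter's canonical code):

```python
def find_file_debug(filenames, target, depth=0):
    """Linear search with tracing for each debugging step."""
    indent = "  " * depth
    print(f"{indent}find_file({filenames}, '{target}')")        # Step 2: trace calls
    if len(filenames) == 0:
        print(f"{indent}-> base case: empty list")              # Step 1: base case
        return -1
    if filenames[0] == target:
        print(f"{indent}-> found '{target}' at index 0")        # Step 3: verify match
        return 0
    result = find_file_debug(filenames[1:], target, depth + 1)
    if result == -1:
        return -1
    print(f"{indent}-> adjusted index: {result} + 1 = {result + 1}")  # Step 4
    return result + 1

print(find_file_debug(["a", "b", "c"], "b"))  # Step 5: small input, trace by hand -> 1
```

Running it on the three-element example prints the indented call tree and returns 1, matching the manual trace above.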
Decision Framework: Which Search Approach When?
Use this flowchart to choose the right search approach:
Is the list sorted?
├─ NO → Use linear search
│   ├─ Need all occurrences? → Use find_all with accumulator
│   ├─ Nested structure? → Use nested search with type checking
│   └─ Just first match? → Use basic linear search
│
└─ YES → Use binary search
    ├─ Large list (n > 100)? → Definitely binary search
    ├─ Many searches? → Binary search (amortize sorting cost)
    ├─ Small list (n < 20)? → Linear might be simpler
    └─ One-time search? → Consider if sorting cost is worth it
Performance Comparison: Real Measurements
Let's measure actual performance:
import sys
import time
import random

# The recursive linear search can exceed Python's default ~1000-frame
# recursion limit on the larger lists, so raise it for this experiment.
sys.setrecursionlimit(25000)

def measure_search_time(search_func, data, target, iterations=1000):
    """Measure average search time over multiple iterations."""
    start = time.perf_counter()  # perf_counter is more precise than time.time()
    for _ in range(iterations):
        search_func(data, target)
    end = time.perf_counter()
    return (end - start) / iterations * 1000  # Convert to milliseconds
# Create test data
sizes = [10, 100, 1000, 10000]
print("Performance Comparison: Linear vs Binary Search")
print("=" * 60)
for size in sizes:
    # Create sorted list
    data = list(range(0, size * 2, 2))  # [0, 2, 4, 6, ...]
    target = random.choice(data)  # Pick random target
    # Measure linear search
    linear_time = measure_search_time(
        lambda lst, t: find_file(lst, t),
        data,
        target,
        iterations=100
    )
    # Measure binary search
    binary_time = measure_search_time(
        lambda lst, t: binary_search_v3(lst, t),
        data,
        target,
        iterations=100
    )
    speedup = linear_time / binary_time
    print(f"\nList size: {size}")
    print(f"  Linear search: {linear_time:.4f} ms")
    print(f"  Binary search: {binary_time:.4f} ms")
    print(f"  Speedup: {speedup:.1f}x faster")
Python Output (approximate, varies by machine):
Performance Comparison: Linear vs Binary Search
============================================================
List size: 10
Linear search: 0.0012 ms
Binary search: 0.0015 ms
Speedup: 0.8x faster
List size: 100
Linear search: 0.0089 ms
Binary search: 0.0021 ms
Speedup: 4.2x faster
List size: 1000
Linear search: 0.0847 ms
Binary search: 0.0028 ms
Speedup: 30.3x faster
List size: 10000
Linear search: 0.8234 ms
Binary search: 0.0035 ms
Speedup: 235.3x faster
Key observations:
- For small lists (n=10), binary search is actually SLOWER due to overhead
- At n=100, binary search starts to win
- At n=1000+, binary search is dramatically faster
- Speedup grows as list size increases (logarithmic vs linear)
The Complete Journey: Summary and Synthesis
The Journey: From Problem to Solution
| Iteration | Problem | Technique Applied | Result | Complexity |
|---|---|---|---|---|
| Linear Search | | | | |
| 0 | Find single item | Basic recursion | Works for first match | O(n) time, O(n) space |
| 1 | Find all occurrences | Naive collection | Wrong indices | Still O(n) |
| 2 | Fix indices | Accumulator parameter | Correct indices | O(n) time, O(n) space |
| 3 | Handle nested lists | Type checking + structural recursion | Works on nested data | O(n) time, O(d) space |
| Binary Search | | | | |
| 1 | Search sorted list | Divide-and-conquer | Wrong indices | O(log n) time |
| 2 | Fix indices | Offset tracking | Correct indices | O(log n) time, O(log n) space |
| 3 | Optimize memory | Index-based (no slicing) | Production-ready | O(log n) time, O(log n) space |
Final Implementations
Linear Search: Finding All Occurrences
def find_all_occurrences(items, target, current_index=0):
    """
    Find all indices where target appears in a flat list.
    Time: O(n), Space: O(n)
    """
    if len(items) == 0:
        return []
    if items[0] == target:
        rest_results = find_all_occurrences(items[1:], target, current_index + 1)
        return [current_index] + rest_results
    else:
        return find_all_occurrences(items[1:], target, current_index + 1)
Nested Search: Finding in Hierarchical Structures
def find_in_nested(structure, target):
    """
    Find all paths to target in nested list structure.
    Time: O(n), Space: O(d + k)
    where n = total elements, d = max depth, k = matches
    """
    results = []
    for i, element in enumerate(structure):
        if isinstance(element, list):
            nested_results = find_in_nested(element, target)
            for path in nested_results:
                results.append([i] + path)
        else:
            if element == target:
                results.append([i])
    return results
Binary Search: Efficient Search in Sorted Lists
def binary_search(sorted_list, target, left=0, right=None):
    """
    Binary search using indices (no slicing).
    Time: O(log n), Space: O(log n)
    """
    if right is None:
        right = len(sorted_list) - 1
    if left > right:
        return -1
    mid = (left + right) // 2
    mid_value = sorted_list[mid]
    if mid_value == target:
        return mid
    elif target < mid_value:
        return binary_search(sorted_list, target, left, mid - 1)
    else:
        return binary_search(sorted_list, target, mid + 1, right)
Decision Framework: Choosing Your Search Strategy
Question 1: Is your data sorted?
- YES → Use binary search (O(log n))
- NO → Continue to Question 2

Question 2: Is your data nested?
- YES → Use nested search with type checking
- NO → Continue to Question 3

Question 3: Do you need all occurrences?
- YES → Use find_all with accumulator
- NO → Use basic linear search

Question 4: How large is your dataset?
- Small (n < 20) → Linear search is fine, simpler code
- Medium (20 < n < 1000) → Consider binary if sorted
- Large (n > 1000) → Binary search essential if sorted

Question 5: How many searches will you perform?
- One-time → Linear might be simpler
- Many searches → Sort once, then use binary search
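The five questions can be collapsed into a tiny helper. This is an illustrative sketch; the name `choose_search_strategy` and the exact thresholds are taken from the framework above, not from a library:

```python
def choose_search_strategy(is_sorted, is_nested, need_all, n, many_searches):
    """Map the five decision questions onto a recommended approach."""
    if is_sorted and (n >= 20 or many_searches):
        return "binary search"              # Q1/Q4/Q5: sorted and worth the setup
    if is_nested:
        return "nested search"              # Q2: hierarchical data
    if need_all:
        return "find_all with accumulator"  # Q3: every occurrence matters
    return "linear search"                  # default: simple and always correct

print(choose_search_strategy(True, False, False, 1000, False))  # binary search
print(choose_search_strategy(False, True, False, 50, False))    # nested search
```

Encoding the decision this way makes the framework testable: each branch corresponds to one question, so a change in thresholds is a one-line edit.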
Complexity Summary
| Algorithm | Time | Space | Best For |
|---|---|---|---|
| Linear search (single) | O(n) | O(n) | Unsorted, small lists |
| Linear search (all) | O(n) | O(n) | Finding all matches |
| Nested search | O(n) | O(d) | Hierarchical data |
| Binary search (slicing) | O(log n) | O(log n) stack + O(n) slices | Sorted lists, learning |
| Binary search (indices) | O(log n) | O(log n) | Sorted lists, production |
Where:
- n = number of elements
- d = maximum nesting depth
- k = number of matches
Lessons Learned
1. Index tracking is crucial in recursive list processing
   - Always account for position in the original list
   - Use accumulator parameters or adjust results
   - Verify indices with actual list access
2. Type checking enables structural recursion
   - Use isinstance() to handle mixed types
   - Recurse into nested structures
   - Build paths to preserve structure information
3. Binary search requires careful index management
   - Slicing loses position information
   - Index-based approach is more efficient
   - Always verify the base case (left > right)
4. Choose the right tool for the job
   - Linear search: simple, works on any list
   - Binary search: fast, requires sorted data
   - Nested search: handles hierarchical structures
5. Recursion shines for hierarchical data
   - Natural fit for nested structures
   - Clearer than iterative approaches with explicit stacks
   - Preserves structure in results (paths)
Practice Problems
Try implementing these variations:
1. Find first occurrence after index k
   - Modify linear search to start at position k
   - Useful for "find next" operations
2. Binary search for insertion point
   - Return where target SHOULD be if not found
   - Used in sorted list insertion
3. Find all in range
   - Find all elements between min and max values
   - Combine binary search with linear scan
4. Nested search with depth limit
   - Only search up to depth d
   - Useful for limiting recursion depth
5. Case-insensitive search
   - Modify to ignore case when comparing strings
   - Handle both flat and nested lists
Next Steps
You've now mastered list searching with recursion! You've learned:
✓ Linear search with the "first + rest" pattern
✓ Finding all occurrences with accumulator parameters
✓ Searching nested structures with type checking
✓ Binary search with divide-and-conquer
✓ Index management in recursive functions
✓ Performance analysis and algorithm selection
Coming up in Module 3: We'll dive deeper into the divide-and-conquer paradigm, exploring how splitting problems in half leads to efficient algorithms like merge sort and quick sort. Binary search was just the beginning!